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Abstract — In the recent work of Candes et al, the problem of 
recovering low rank matrix corrupted by i.i.d. sparse outliers is 
studied and a very elegant solution, principal component pursuit, 
is proposed. It is motivated as a tool for video surveillance 
applications with the background image sequence forming the 
low rank part and the moving objects/persons/abnormalities 
forming the sparse part. Each image frame is treated as a column 
vector of the data matrix made up of a low rank matrix and a 
sparse corruption matrix. Principal component pursuit solves 
the problem under the assumptions that the singular vectors of 
the low rank matrix are spread out and the sparsity pattern of 
the sparse matrix is uniformly random. However, in practice, 
usually the sparsity pattern and the signal values of the sparse 
part (moving persons/objects) change in a correlated fashion 
over time, for e.g., the object moves slowly and/or with roughly 
constant velocity. This will often result in a low rank sparse 
matrix. 

For video surveillance applications, it would be much more 
useful to have a real-time solution. In this work, we study the 
online version of the above problem and propose a solution 
that automatically handles correlated sparse outliers. In fact we 
also discuss how we can potentially use the correlation to our 
advantage in future work. The key idea of this work is as follows. 
Given an initial estimate of the principal directions of the low 
rank part, we causally keep estimating the sparse part at each 
time by solving a noisy compressive sensing type problem. The 
principal directions of the low rank part are updated every-so- 
often. In between two update times, if new Principal Components' 
directions appear, the "noise" seen by the Compressive Sensing 
step may increase. This problem is solved, in part, by utilizing the 
time correlation model of the low rank part. We call the proposed 
solution "Real-time Robust Principal Components' Pursuit". It 
still requires the singular vectors of the low rank part to be 
spread out, but it does not require i.i.d.-ness of either the sparse 
part or the low rank part. 



I. Introduction 

Principal Components' Analysis (PCA) tries to find the 
"principal components' space" with the smallest dimension 
that spans a given dataset. In practice, data is noisy and in 
this case PCA finds the smallest subspace to represent the 
dataset with a given mean squared error (MSE) tolerance. 

Given a low rank data matrix M G r™x« (each column 
of Al is one data vector), PCA finds its principal components 
(PCs) as the left singular vectors of Al that have nonzero 
singular values. This is the same as first estimating the data 
covariance as (l/n)MM T , computing its eigenvalue decom- 
position (EVD) and retaining eigenvectors corresponding to 
nonzero eigenvalues. When data is noisy, this is replaced by 
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arranging the eigenvectors in decreasing order of eigenvalues, 
and retaining the smallest number of eigenvectors so that 
the sum of the remaining eigenvalues (which is equal to the 
residual MSE) is less than the MSE tolerance. 

When the noise is small, the above approach works well. 
However, covariance matrix estimation, and hence the corre- 
sponding EVD, are sensitive to even a few large outliers in 
the data. Unfortunately, in practice these do occur, e.g. when 
trying to compute the principal components' subspace for a 
video sequence, parts of it may get occluded by other moving 
objects. There has been a large amount of work in literature on 
"Robust PCA", e.g. 0, 0, 0, i), 0, 0, Q, 0, 0, most 
of which either assumes the locations of the missing/corruped 
data points are known 0, which is not a practical assumption, 
or (ii) first tries to detect the corrupted pixels and then either 
fills in the corrupted location using some heuristics or (iii) 
often just removes the entire outlier vector. In a series of 
recent works IflOl , IfTTI . lfT2l . a very elegant solution to this 
problem was provided that treats the outlier as a sparse vector. 
In IflOl , the data matrix Al consists of a low rank matrix that 
is corrupted by sparse outliers, i.e. 

M = L + S 

where L is a low rank matrix having a singular value decom- 
position (SVD) L S = D UDV T and S is sparse and can have 
arbitrary large magnitude. Let ||£||* denotes the nuclear norm 
of L, i.e., the sum of singular values of L. It is shown in 
IflOl that L and S can be recovered with high probability by 
solving a convex optimization problem, named as Principal 
Component Pursuit (PCP), 

min ||L||,+A||S||i (1) 

subject to L + S = Al 

provided the singular vectors of L are spread out (not sparse), 
the support and signs of S are uniformly random (thus not low 
rank), the rank of L and the fraction of corrupted entries in S 
are both sufficient small. A more recent work, fT21 . extends the 
result of IflOl showing that, with a proper weighting parameter 
A, PCP can recover L and S with high probability even if the 
size of support set of S is large, as long as the rank of L is 
small enough. But it requires that 5 has random support and 
random signs. 

PCP IflOll is motivated as a tool for surveillance applications, 
with the background variations approximately lying in a low 
dimension subspace, and the sparse part being the "moving 
persons" or "abnormalities" to be detected. It is an offline 
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method which treats each image frame as a column vector of 
the data matrix AI. While this is a very elegant and novel idea, 
there are certain limitations. 

1) In surveillance, it would be more useful to obtain the 
estimates of the sparse part on-the-fly rather than offline. 

2) The sparsity pattern (support and signs) of the sparse 
part may change slowly or in a correlated fashion, which 
may result in a low rank sparse matrix. In this case, PCP 
assumption will not get satisfied and as a result it will 
not work, e.g. see Figff] 

3) The principal directions (set of eigenvectors correspond- 
ing to nonzero eigenvalues) can change over time. So the 
rank of the matrix L will keep increasing over time thus 
making PCP impossible to do after sometime. 

This last issue may get resolved by not using all frames of 
M, but only the latest image frames. But the first two issues 
still remain. 

In this paper, we propose an online approach to solve this 
problem. Our goal is to causally keep estimating the sparse 
part St at each time, and to keep updating the principal 
directions every-so-often. The t-th column of M, M t , is the 
data acquired at time t. It can be split as 



M t = L t + S t = [U I] 



Xi 



(2) 



where x t := U T L t and the matrix U is an unknown m x 
m orthonormal matrix. The support of the vector St changes 
slowly over time. Given an initial estimate of P t := (U)iy t , 
denoted P t , we solve for the sparse vector St by first finding 
the orthogonal complement matrix P t j_ and then using the 
projection of Mt onto Pt,±, denoted by y t , 



P?±St- 



to solve for St- Notice that if P t ~ Pt the first term 
will be close to zero and can be treated as "noise". When 
Pt 7^ Pt (new directions added), the "noise" can be reduced 
by using the time correlation model on L t . Furthermore, recent 
estimates of L t := M t — St are stored and used to periodically 
update P t as described in Sec. IIII-DI There are also some 
limitations of our method. 

1 ) We need an approximately accurate initial estimate of the 
PCs' basis, Po, which is easy to get using training data 
without sparse corruptions. 

2) The orthogonal complement P t ,±_ needs to satisfy some 
conditions for Compressive Sensing to succeed. 

3) An appropriate choice of constraint parameter e is needed 
for estimating St- 

The above idea is somewhat related to that of iTPTl in that 
both try to cancel the "message" signal and only solve for the 
sparse "error" signal, but with the big difference that in fT3), 
P t is known. Other related work which also uses P t known 
is |[T4l . |Q3]. However in our problem P t is unknown and 
can change with time. Out method requires the columns of 
Pt ± be spread out (not sparse), but it does not require St to 
have independent nonzero entries. In fact, we can utilize the 
correlated support change of St over time, t, to our advantage 



in future work. Since we update the principal directions on-the- 
fly, the dimension of the principal subspace remains bounded. 

A model similar to (f2]i but for a static problem and with U 
being a known matrix, was introduced in lfl6l . ifPTll . A method, 
termed as pursuit of justice (PJ), is introduced to solve for the 
sparse vector u — [x t ,St] T which solving the following i\ 
minimization problem 



mm « i 

u 

subject to Mt = 



(3) 



Au 



where A := [U I}. Notice that in our problem U is unknown, 
and thus we cannot use sparse reconstruction techniques to 
find Xt. Given an estimate Pt, the above can be modified to 
A = [P P t ± I] . However, this does not work as shown in 
Fig. [3] 

A. Notations 

The set operations U, D and \ have the usual meanings. For 
any set T C {1, • • • m}, T c denotes the complement set of T, 
i.e., T c := {l,---m}\T. 

For a non diagonal matrix A, we let Ai denote the zth 
column of A and we let At denote a matrix composed of 
the columns of A indexed by T. For two set T\ and Ti, we 
let At x ,t 2 denote a submatrix of A consisting of the rows 
indexed by T\ and columns indexed by T-i- For a diagonal 
matrix Q, Qt denotes a submatrix of Q consisting of the rows 
and columns indexed by T. In other words, Qt is a diagonal 
matrix with (Qt)jJ = (Q)r 3 -,T r 

For vector v, Vi denotes the ith entry of v and vt denotes 
a vector consisting of the entries of v indexed by T. \\v\\k 
denotes the 1^ norm of v. The support of v, supp(w), is the 
set of indices at which v has nonzero value, supp(w) := {i : 
Vi + 0}. 

We use to denote an empty set or an empty matrix. 

II. Problem definition and Signal Model 

The tth column of M, M t 6 R mxl , is the data at time t 
which can be split as 



M t 



Lt + S t 
Ux t = Pta t 



where xt '■= U T Lt and St are sparse vectors with slowly 
changing support Nt := supp(xt) and Tt := supp(St), 
respectively. Nt is modeled as being piecewise constant with 
time. The vector at := (xt)N t is the none-zero part of x t . The 
principal components' basis at each time t, P t := U^ t , is a 
submatrix of U whose columns span the principal components' 
subspace at time t. It is unknown and can change over time. 

Since the matrix U does not change with time (in this work), 
the only way P t changes is when the set N t changes. This 
happens every d frames. We assume that xt and hence L t = 
Uxt follows a piecewise stationary model with nonstationary 
transients when switching pieces. For every d frames, there 
are some supporting indices get added or deleted from Nt. 
Specifically when an element i gets added into the support, it 
gets added with an initial small variance Oaf (with < 9 < 1) 
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and then at future times follows a first order autoregressive 
(AR-1) model with AR parameter / and stable variance of. 
Recall that an AR-1 model is asymptotically stationary. Thus, 
after the initial transient period, x t is stationary until the next 
support change time. Before an element i gets deleted, it starts 
decaying as (x t )i = fd(xt-i)i, with < f d < f < 1, and 
soon decays to zero. 

A. Mathematical description of signal model for x t ( and hence 
for Lt) 

The support set of x t , N t , is a union of three disjoint sets 
A t , D u and E u i.e., N t = A t U D t U E t . The addition set 
A t := Nt\Nt-i is the set of indices for the new appearing 
eigenvectors (U)A t - The set D t C (Nt PI N t -i) is the set of 
indices of those eigenvectors whose eigenvalues are decreasing 
at time t. The set E t := JV t fl jVh \ D t is the set of indices 
for existing eigenvectors with non-decreasing eigenvalues. The 
sets D t and A f can be empty. For any time r with "decreasing" 
set D T , we assume that D T will not get added to Nt for any 
t > T. 

Let S = diag(of ), i = 1, • • ■ , m, be a diagonal matrix with 
non-increasing positive diagonal elements, i.e. erf satisfying 
of > We model xt as 

xo = 0, N = 

it = ^tst-i + ^t, ^ 4, ~ ' <V(0, Q t ) (4) 
where and Q t are two diagonal matrices defined as below 

(Ft)A t =0, (Q t ) At =9(E) At , 

(F t ) Et = fI, {Qt) Et = {l- f){^)E t , 

(F t ) Dt = fd, (Qt) Dt = 0, 

(F t ) N . = 0, (Q t )jv= = 0. 

where /, fd, and 9 are scalars satisfying < fd < f < 1 and 
<0 < 1. 

From the model on xt, we notice the following: 

a) At time t = r, (x t )a t starts with 

(x t )a t ~Af(O,0(£) A J. 

Small ensures that new directions gets added at a small 
value and increase slowly. (x t )d t decays as 

(x t )d t = fd{x T -l)D T 

(xt)e t follows an AR-1 model with parameter /: 

(x t )e t = f(x T ~i)E T + (v t )e t 

b) At time t > r, if A T is not removed from the support 
set, the variance of [xt)A T gradually increases as 

(x t )i ~ Af(0, (l - (l - 0)/ 2( *- T) )S M ), i e A T 

Eventually, the variance of (xt)A T converges to (S)a t - 
For example, with / = 0.9 and 9 = 0.4, the variance of 
(xt)A T gets to 0.9(E)a t in 18 frames. 

c) At time t> r, the variance of (xt)D r decays as 

(xtK~Af(0,/f- T) (£K) 

Eventually, [xt)D T decays to zero. For example, with 
fd = 0.1, the variance of (xt)D T decrease to 
0.0001(S)r, T in 2 frames. 



B. Model for S t 

Recall that in video surveillance applications, the data 
matrix M is obtained by stacking each image frame as a 
column vector, whose low rank component L corresponds to 
background variation lying in a low rank subspace and sparse 
component S captures the moving objects in the foreground. In 
this work, we use a simple model for the sparse component 
S modeling the activity of the moving objects as described 
below. 

We assume that, in each image frame, there are k (k > 1) 
objects in the foreground. Each object occupies a 3 x 3 pixel 
block which has nonzero pixel values. All other pixels in the 
foreground have zero values. Let CG\ denote the coordinate of 
the center of gravity of the ith object at time t, i = 1, • • • , k. 
For the next image frame, each CG\ can either be static with 
probability p or move one step to the left/right/top/bottom with 
probability (1 — p)/4 each, i.e., for i = 1, ■ • • , k, 



' CG\_ 




with probability p 




CGI 


-i + (l,0), 


with probabilty (1 - 


-p)/i 


CGI 


-x + (-l,0), 


with probability (1 


-p)/4 


CG\ 


-i + (0,l), 


with probability (1 


-p)/4 


k CGI 


-i + (0,-l), 


with probability (1 


-p)/4 



with p = 0.8. The pixels in each block move accordingly. 
Except if the objects move very fast or if they are very small, 
there will be overlap between their regions from frame to 
frame. We then stack the resulting foreground image frame 
as columns of S. Clearly, the support of St, tth column of S, 
is time correlated and the signs of these nonzero entries are 
fixed. This is quite different from fTOl and 1 1 2| where random 
support and random signs are assumed on the sparse part S. 

III. Real-time Robust PCP 

An overview of our method, real-time robust PCP (RR- 
PCP), is shown in Fig[TJ We first discuss the approach to 
recursively reconstruct the sparse component St- Next, we 
discussed the way we track the changes of the principal 
directions. Finally, a complete algorithm is given in Algorithm 

12 

A. RR-PCP: recursively reconstruction of the sparse part St 

Using P t , which is an estimate of principal components P t 
at time t, we can rewrite L t and M t as 

L t - P t at + Pt,±p\ 
M t = P t at + Pt,±/3t + St 

where P t .±_ is an orthogonal complement of P t \ at '■= L t 
is the projection of L t onto the subspace spanned by P t ; and 
(3t := P^±Lt is the projection of L t onto the subspace spanned 
by Pt,j_- Notice that Pt is an estimate of Pt. It is either just a 
slight rotation of Pt with span(P t ) = span(P t ) or there may 
be some missing and extra principal directions. The column 
vectors of P t ± are the eigenvectors spanning the null space 
of P t T . The orthogonal complement P t ,±_ is not unique. 
Let 

Vt :=PZ ± M t =pT x St+/3 t (5) 
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If there is no missing principal direction, i.e., span(P t ) C 
span(p), fit = 0. If there are missing principal directions, 
span(P t ) ^ span(p) and f3 t ^ 0. In this case, (3 t in (O is the 
"noise" resulting from the estimation error of current principal 
directions. This now becomes a noisy sparse reconstruction 
problem, with "noise" f3 t - When \\PtW2 is not ver Y large, we 
can causally recover St by solving 

minHslli s.t. \\P? ± (M t - s)\\l < e (6) 

s ' 

and hence estimate L t as 

L t = M t - S t 

where e is a parameter with some small positive value. 

For the case of missing principal directions, span(P t ) ^ 
span(p). Let tS trn i ss := span(P f ) \ span(p) denote the 
"missing" subspace and let Pt,miss be its orthonormal basis 
matrix. Thus span(P t , miss ) = <S tl mi SS and 

span(P t ) = span(P) © span(P ijrniss ) (7) 
span(P tj± ) = span(P t ^) ffi span(P^ miss ) (8) 



Therefore, 



0i 



Pt,X^t — Pt,±PtMssPt,miss-^t 



(9) 



with P t T miss ii being the projection of L t onto the subspace 
'St.miss- If -P/Tmiss-k* starts with small values, \\PtW2 shall be 
small and it can increase over time. When \\PtW2 is getting 
too large, © may give incorrect estimate St- Thus, we need 
to update p and get those missing directions detected. 



B. Canceling the "noise" using the time correlation of Xt 

It is expensive to update p and P t .± very frequently, 
especially for some real-time applications. But notice that we 
can cancel out some of (3t by using the model on x t from 
Sec lII-AI We modify © as 



min \\s\\ x s.t. \\P7M - s - /£ t -i)||| < e 



(10) 



Let f3 t -i := P t T j_-L t -i. Note that in <|T0j, the "noise" is fi t - 
fPt-i while in the "noise" is f3 t . 

Next, we discuss an example showing that the "noise" f3 t — 
fPt-i in ( [Tol l is smaller than the "noise" f3 t in ©. 

Suppose at time t = r — 1, we have an exact estimate 
of all principal directions, P T _i = P T -i- At time t = r, a 
support change occurs with N T = N T -i U A T and D T = 0. 
The principal directions at time t = r are P T = [P T -i, Pa t ] 
where Pa t = Ua t are the new added principal directions. 
However, this change is unknown to us and we just use P r = 
P T _i = P T _i. Therefore, 



span(P r ) = span(P T _i' 
Thus, at time t > r, 



span(P A ^ 



P^Lt = pT(Pr-iP?-iLt + PtA^Lt) 



Pl^PtA.PUUxt 



(11) 



Assuming L t -\ is correctly recovered, i.e., L t -\ 
fa-i = fk-i, then 



Lt-i 



fit ~ fPt- 



P 



_P,ArPt T A, 



U 



(^)a t 





Let B 



E( 



= P^PiaM)^ 
-- (Pr,±PtAr) T (Pr : ±Pt,Ar)^ then at time t > r, 
Edl^Hi) = E B M (1-(1-W 2( *- T) K? 

/A-illl) = s p m (i-/ 2 K 2 (12) 



Clearly, E(||/3 t |||) shall be much larger than E(||/3 t - 



fPt 



-ill!) 



For example, with / = 0.9, 9 = 0.4, at t = t ■ 



E(||/3t|||) is 0.514 E P M of while E( 



r 2. 



ieA T 

and the ratio of E( 



Still) 



- /A-ill!) 

and E{\\/3 t 



1. 

is 



0.19 E 

ffit-iW^) keeps increasing over time. Recall that (3 t and 
Pt — fPt-i are the "noise" in © and (fTOb . therefore, assuming 
we have an accurate estimate of principal basis at time r — 1, 
( fTOl shall give more accurate estimate of St than ©. We show 
a plot of the expectations of ||/?t||| and ||/3 f — f j3 t -i ||| in Fig. 

El 

In (|6]l and dlOK we need an appropriate parameter e which 
should be proportional to the "noise" term \\PtW2 m © or 



IIA-//5 t -i|llin m- The "noise" 



Hand 



-ill! 



also changes over time. Thus, we shall set the e adaptively. In 
our work, we use e proportionally to 2||/3t_i||| for © and e 
proportional to 2\\p t -i - //3t-2||i f° r <TT0t . 

If the constraint is too tight (e is too small), © or ( TTOb may 
give solutions with some small nonzero values outside the true 
support set T t (also verified by our numerical experiments). 
As first done in |[T8l , we can also do support thresholding 
followed by least square estimation to reduce these errors, i.e., 
we can solve 



Tt 
(St) ft 

(S t )fc 



(5 t ) 4 > 7 } (13) 

((PL)f t )Hyt-fPLL t -i) (14) 


M t - S t (15) 



C. Using model on St 

Currently, we do not use the model on St- A simple way 
to use it is to do modified-CS |[T9l as 



||* T . ||i s.t. \\P?AM t 



fLt-i 



< € 



(16) 



with Tp re d being an estimate of the support of St- As |[T9l 
shows, if Tpred is an approximately correct estimate of cur- 
rently support, ( TToT l should improve the performance of (fTOl . 

In previous work |20l . |2TI . we used the previous support 
estimate, Tt-i, as r pie d- This is sufficient for the problems 
considered in lEUl . where support changes very slowly over 
time, e.g. in case of wavelet coefficients of a medical image 
sequences. But for our current problem, even with one or 
two pixel motion between frames, the support change will be 
significant and T f _i will have large error w.r.t. T t . A better 
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Fig. 1: Real-time Robust PCP 



solution, is to use the motion model to predict the object(s)' 
location in the next frame and use this prediction to obtain 
T pre( j at time t. The details of how to do this, especially for 
multiple objects, will be worked out in future work. 

D. RR-PCP: Recursively estimating the low rank part 

When some new principal directions appear, we need to 
detect these directions timely before the "noise" gets large. 
Now, E(||/3t — //3|||) in © seems not increase with time. But 
this assumes Lt-i = L t -\ which is not true. When some ex- 
isting directions vanish, they also need to be removed from P t . 
Otherwise, the number of estimated principal directions keeps 
increasing and thus the number of columns in Pt ±, which is 
the number of measurements for ( [Tol l, keeps decreasing. 

At initial time, we have the training data L° := 
[Li, ■■■ , L to ], which contains no sparse component. Ac- 
cording to our signal model @, the data sequence L t is 
time correlated. Thus, we need a long sequence's data to 
get an accurate estimate of it's covariance. But notice in our 
model, the sequence L t — fLt-i is time independent and 
has same eigenvectors as L t . Thus, we estimate principal 
directions of L° by estimating the covariance of L t — /£t-i 
and computing its EVD. Let Pn and Go be the eigenvec- 
tors and (non-zero) eigenvalues of the covariance matrix of 
Lt-fLt-i, t = 2,- • -t , i.e. PnGnP T = L°(L°) T . Let F stable 
be an orthonormal matrix whose columns are the correctly es- 
timated eigenvectors and let G sta bie be a diagonal matrix whose 
diagonal elements are the correspondingly correctly estimated 
eigenvalues. Let P t = P sta bie = Po and let G t = G sta bie = G . 

Our PCs update procedure is designed to estimate the 
current principal directions for data generated according to 
the piecewise stationary model on xt (and hence on L t ) that 
was described in Sec. III-AW Assume that every d frames, 
k new directions get added or removed or both from the 
PCs' subspace. The newly added directions' variance starts 
at a small value and slowly stabilizes to the stable value. 
For deleted directions, we set (vt)i = immediately and we 
replace / by fd < f (ensures quicker decay). 

'in future work, we will analyze real data and study existing literature 
to come up with more realistic models and the corresponding PCs update 
algorithms. For example, in practice U may not be fixed, but may also rotate 
gradually over time. 



Consider a change time, t = r. Let P new := (U) 



N T \N T -! 



be 



the matrix containing the k newly added directions. Our PCs 
update algorithm assumes the following, 

1) The previous additions are detected and correctly esti- 
mated before a new set gets added. 

2) Let the data matrix D contain Td frames of L t — fL t -\ af- 
ter the new directions have been added. Then span(P new ) 
is contained in the span of the data, span(D). 

Assumption 1) holds approximately if d is large enough. 
Assumption 2) holds with high probability if >> k. 

We split the estimate of PCs' basis, P t , into two parts, 
Pt = [Ptabie, Pew] where P sta bie is the "stable" (correctly 
estimated) set of principal directions and P new are the new 
ones which are still being rotated and corrected. We would 
like to compute an initial estimate of P new as soon as possible 
(using only a few frames after \\f3 t \\ 2 exceeds a threshold). Say 
we use Td frames and let the matrix D contains L t -\ — fLt-\ 
for these frames. We can compute an initial estimate of the 
new directions, P new , by computing the principal directions of 
the sample covariance matrix of (/ — PtabiePstabie) T P- This 
is done by step 1 .b) of Algorithm Q] By assumption 2), if 
Td > k, then we would have found the correct span, i.e. 
span(P new ) 2 span(P new ). But notice that without enough 
data, even though span(P new ) contains span(P new ), it will 
typically contain many extra directions. As more data comes 
in, we keep rotating P new every-so-often until variances along 
some directions become approximately zero and these get 
thresholded out. Once this has happened, the estimated rotation 
matrix P along the existing directions becomes close to 
identity and remains this way. This is the time we can add 
P new into P s t a bie- This is done by step l.c) of Algorithm Q] 

When the variances along some directions in P sta bie begin 
to decrease and eventually decay to zero, we compute the 
variance of last Td e i frames along P st abie and then remove 
directions with small variance from P st abie- This is done by 
step 2) of Algorithm Q] 

The above PCs update procedure is summarized in Algo- 
rithm Q] In Algorithm Q] D and Pdei are data matrix to store 
the data difference L t — fL t -\. The parameters, Td, r r , and 
Tdei, are the length of each data piece we use to detect new 
directions, to rotate and correct newly added directions, and 
to remove decayed directions, respectively. We use two small 
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Algorithm 1 Updating P t 

1) Detect new appearing directions 

a) If status = stable, compute HA-iHl : = llA-i ±-^t-i||i- 
If ||/3t-i||| > S, set status <— detection and store data in 
D, i.e., D [D, Lt-i — JLt-2\- If not, keep status = 
stable, go to step 2). 

b) If status = detection, 

- If there are less than Td frames in D, keep storing 
data difference in D, i.e., D = [D, L t -\ — fL t -2\- 

- If there are Td data frames in D, compute K = (I — 

Pstable -Psmble)^- 

* Estimate P new by computing the EVD of KK T , 
i.e, 

L KK T BVPp G pT 
Td 

T d = {%, G iti > £d} 

Pnew ! G new ^Td 

where G is a square matrix and Gr d is a submatrix 
of G consisting the rows and columns indexed by 
T d - 

* Let D = 0. 

* If -Pnew = 0, set status •(— stable and set I = 0. If 
Pew 7^ 0, set status rotation and set I = Td- 

- Go to step 2) 

c) If status = rotation, 

- If there are less than r r frames in D, keep storing 
data difference in D, i.e., D = [D, Lt-\ — fLt-2\. 

- If there are r r data frames in D, let K = P^D. 

* Rotate P new and G new using K, i.e., 

T— " {IGnew + KK T ) EV = D PGP T 

I + T r 

T r = {i : Gi,i > ^r} 

-Piew = (PnewP)T,.: G new = (G)t t 

If P is approximately an identity matrix, 

let P s table [Pstable , Pnew] ; G s t a ble 

[Gstabie, Gnew]- Set status <— stable, and let P new = 
0> G new = 0, Z = 0. If not keep status < — rotation 
and let I = I + Td. 

* D = 0. 

- Go to step 2). 

2) Remove decayed directions from Pstable- 

• If there are less than Td e i data in Z?deh keep store data 
difference in D del , i.e., D de i = [D dei , L t _i - /P t - 2 ]- 

• If there T de i data in Z?dei, detect decayed directions as 
follows 

- Find T del := {i : ^(P sta ble)f P ) delP ) d r el(Pstable) 4 < 
0.05(G stab le)M}. 

- Remove (Pstable)^, from P sta bie and remove 

(G st able)T del from Gstable- 

- Set D del = 0. 

3) Let P t = [Pstable, Pnew]- 



thresholds, ^ and £ r , to detect new directions P new in step lb) 
and to threshold extra directions out from P new in step lc). 
They are proportional to the total variance along all existing 
stable directions. 

E. A complete algorithm 

The complete algorithm of real-time robust PCP is given in 
Algorithm El 

In Algorithm 12 we compute Pt.±, orthogonal complement 
of P t or equivalently the null space basis of P t T , using the QR 
decomposition of P t l22l . Suppose P t is m x r matrix with 
m» r. We find an m x m orthonormal matrix H such that 

P t *=" HJ=[H 1 H 2 ] P 1 

where J\ is an r x r upper triangular matrix. H\ consists of 
the first r columns of H and H-2 is made up of the last m — r 
columns. The columns of H-2 span the null space of P t T and 
we let P t> ± = Hi- 



Algorithm 2 Real-time Robust PCP (noise canceled) 
Training: Given training data Lq = [L\,--- ,L to ], estimate 
principal components of Lq by computing the eigen-pairs of 
the sample covariance of L t — j Lt-\- Let Pq and Go denote the 
eigenvectors and eigenvalues. Set P sta bie = Po, G sta bie = Go- 
At time t = to, 

. Set status = stable. Let P t = P stab i e , G t = G stab i e , P new = 
0, G new = 0. 

. Let D = 0, I = 0, L> de i = 0. 
FOR t > t , do the following: 

1) Estimate PCs' subspace of low rank part using Algorithm 
Q]and compute Pt,±- 

2) Estimate sparse part, St, by solving ( TTOb with e = 

3) Support thresholding and least square estimation: do ( Tf3l , 
Gil, and dBJ. 

4) Increment t by 1 and go to step 1). 



IV. Experiment results 

We simulated Lt 6 R 128xl using the model described in 
Sec III- Al The first to = 5 x 10 3 frames contains no sparse 
part and we use it as training data. The sparse vector, x t , 
follows a AR-1 model with parameter / = 0.9. There are 
32 principal directions with variances ranging from 1 x 10 4 
to 9. Recall that in our model, L t is time correlated and the 
sequence L t — fL t -\ has same eigenvectors as L t , we get 
initial estimate of PCs' subspace by estimating the covariance 
of L t — fLt-x and computing its EVD. 

The sparse component St £ ]R 128xl first arises at time 
t = to + 1. The nonzero entries of St has positive magnitude 5, 
which is usually much smaller than magnitude of the nonzero 
entries of x t . For t > to + 1, the support of St changes 
following the model described in Sec lII-BI resulting in a low 
rank matrix S. 

At time t = to + 5, we add one new direction P new with 
variance 50 to PCs' basis and let it starts at a small value 
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- RR-PCP (noise cancele 

- RR-PCP (basic) 
-PJ 
-PCP 




5000 5010 5020 5030 5040 5050 5060 5070 5080 5090 5100 5110 5120 5130 5140 5150 

time -» 

Fig. 2: Comparison of RR-PCP (noise canceled), RR-PCP (basic), PJ and PCP 




5000 5010 5020 5030 5040 5050 5060 5070 5080 5090 5100 5110 5120 5130 5140 515 

time -> 



Fig. 3: Monte Carlo averaging for three on-line methods, RR-PCP (noise canceled), RR-PCP (basic), and PJ 



with = 0.4. It slowly stabilizes to the variance 50. At time 
t = to + 100, one existing direction (not P new ) begins to decay 
with U = 0.1. 

We do RR-PCP (noise canceled), RR-PCP (basic), PJ and 
PCP using the data generated as described above and plot the 
percentage error in Fig. __ The percentage error is defined as 



percentage error := 



\S t -St 

wsth 



In Fig. __ Si-.t is 1024 x t dimensional at time t, but its rank 
ranges from to 51. 

For RR-PCP (noise canceled), we do algorithm [2] with = 
i~r = Tdei = 20. At t = to + 6, it detects the appearance of 
new directions and set status = detection. At t = to + 26, 
a new piece of data containing frames are available, RR- 
PCP do step lb) of Algorithm Q] and get Pnew, an estimate of 
the new direction P new . There are 7 new directions in P ne w, 
and the coherence between these new estimated directions and 
the true one P new ranges from 0.9393 to 0.0051. So, with 
frames of data, it approximately finds a subspace containing 
P new and some extra directions. For t > to + 26, we do step 
lc) of Algorithm Q] to rotate P new closer to the true one and 
threshold out those extra directions for every r r = 20 new 
frames of data. For example, at t = to + 47 when a new piece 
of data is available, we do step lc) which rotates P new closer 
to the true P new and get 2 directions thresholded out. The 
maximum coherence of P new and P new goes up to 0.9505. At 
time t = t +68, another two directions are thresholded out and 
the maximum coherence of P new and P new goes up to 0.9526; 
at time t = to + 110, the rotation matrix P is close to an 
identity matrix (on-diagonal elements larger than 0.9999 and 



off-diagonal elements smaller than 0.01). Only one direction 
is left in P new > with coherence 0.9553 to P nev/ . It sets status = 
stable and adds P new to the stable set of principal directions. 
At time t = to + 126, it removes the deleted direction from 
the estimated PCs' basis successfully. 

For RR-PCP (basic), we do same thing as RR-PCP (noise 
cancelled) but replace ( TTOb with __) and replace ( fT4b by 
doing LS on y t . We see that error of RR-PCP (basic) is 
larger than RR-PCP (noise cancelled) because it does not use 
the information contained in L t -i, i.e. \\(3 t \\ 2 is larger than 
||A-//3 t -i!| 2 (seeFig|4]>. 

For PJ, we solve __> with A = [p, P,_l, Pj. PJ recovers xt 
and S t while RR-PCP (noise cancelled) and RR-PCP (basic) 
cancel the term xt by P^j_. Recall that x t has variance ranging 
from 1 x 10 4 to 9, the magnitude of x t is much larger than St- 
PJ recovers the significant part Xt and cannot get St recovered 
correctly. 

For the off-line method PCP, at each time t, we solve (Q]i 
using all available data framefl [Mi, • • • , M t ], and plot the 
error for current frame St. The error of PCP is large because 
the support of St is time correlated and S t does not has random 
signs. To implement PCP in a causal fashion, it requires about 
200 - 300 seconds at every time t, while RR-PCP takes about 
1.7 seconds at every time t. 

We do 50 times Monte Carlo simulation for three on-line 
methods, average the percentage error and plot them in Fig. 
|3] As can be seen from FigfJ] our method RR-PCP (noise 
canceled), gives the smallest error. In Fig|4] we plot the 

2 We use Accelerated Proximal Gradient method with code available at 
http://perception.csl.illinois.edu/matrix-rank/sample_code.html 
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5000 5020 5040 5060 5080 5100 5120 51405150 
time -» 



Fig. 4: E||A||i v.s. E||A-/A-i||| 

expectation of := \\P£ ± L t \\l and ||/3 t - fp t -i\\l := 

\\pTj_(L t - /L t -i)||| for RR-PCP (noise canceled). It shows 
that, for t < t + 26 when there are some missing principal 
directions, ||/3 t — /A-ill! is much smaller than For 
£ > to + 26, RR-PCP (noise canceled) gets an estimate P new 
and adds it to the estimate of current PCs' basis, thus, both 
IIAHl and ||A-/&-i|la decreases. However, 
is still slightly less than \\PtW2 due to the time correlated model 
on L t . Recall that WfltWi is me noise in © and ||/3 t — ffit-i ||| 
is the noise in dTOb . that is the reason why RR-PCP (noise 
canceled) is better than RR-PCP (basic). 

V. Discussion and future work 

In this work, we used a simple motion model on the sparse 
vector St as explained in Sec. IH-BI Under this model, the 
support of St changes slowly over time, resulting in a low 
rank matrix S. Because of this, PCP is unable to distinguish 
S from the low rank L. But our method, RR-PCP, works 
because it does not require the sparse matrix S to to be 
uniformly random. In this work, we have not utilized the 
correlated support change of St to our advantage. But, in 
fact, RR-PCP can be improved significantly by using this 
knowledge and by adapting the modified-CS idea of |fl9l , ll20l 
to incorporate motion prediction. We can use the knowledge of 
the object's motion model and the previous support estimate 
to obtain the current support prediction T pie< i of the sparse 
part. If this prediction is accurate and is used in JTol l, with an 
appropriately chosen e, the reconstruction error should reduce 
significantly, especially when the support size of St is large. 
In future work, we will develop realistic motion models and 
corresponding motion prediction algorithms to get reliable 
support predictions of the sparse part. We will also analyze 
their performance, first assuming P t is perfectly known and 
later for the practical case of P t unknown. 

Our PCs updating procedure is designed for the data gener- 
ated according to the piecewise stationary model on x t while 
U is a constant but unknown orthonormal matrix. In future 
work, we will analyze real data and study existing literature 
to come up with more realistic models and the corresponding 
PC update algorithms. 

For very-large scale data, it is computationally and memory 
intensive to compute Pt.±- In future work, we will develop 
computational efficient alternatives. For example we can use 
the fact that ||P f T ± z|| 2 = \\Pt,±P^ ± z\\ 2 = - P t P?)z\\ 2 . 



A somewhat related work is Jin-Rao's approach lfT4l which 
solves 

min||s||i s.t. \\M t - P t a - s\\\ < e (17) 

a,s 

In lfl4l . the matrix Pt is a known and fixed regression 
coefficients' matrix, which is no longer true in our problem. 
We can use the time correlated model on x t (and hence on 
at) and the motiom model on St to modify (fTTI i following a 
similar way of RR-PCP. 
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